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Abstract.  A  number  of  conjugate  gradient  methods  are  considered  for  a  class  of  linear  systems  of 
real  algebraic  equations.  This  class  includes  all  symmetric  and  certain  special  nonsymmetric  problems, 
which  give  rise  to  three-term  recursions.  All  the  algorithms  are  characterized  variationally.  This 
makes  it  possible  to  derive  error  estimates  systematically  in  terms  of  certain  polynomial  approximation 
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1.  Introduction.  It  is  the  purpose  of  this  survey  article  to  present  some  standard 
and  not-so-standard  conjugate  gradient  methods  in  a  unified  way.  We  consider  iterative 
methods  for  solving  n  x  n  linear  systems  of  real  algebraic  equations 

(1)  Ax  =  b. 

We  will  focus  our  attention  on  general  symmetric,  not  necessarily  definite,  and  special 
nonsymmetric  problems  where 

(2)  A  =  I  -  K,     KT  =  -K. 

Throughout  the  paper  /  denotes  the  identity  matrix  and  KT  the  transpose  of  K . 

Many  of  the  results  presented  here  have  been  known  to  the  authors  and  specialists 
in  the  field  since  the  late  seventies  and  most  have  already  appeared  in  the  literature. 
However,  we  know  of  no  previous  survey  covering  all  these  topics  systematically. 

In  their  pioneering  work,  Hestenes  and  Stiefel  [24]  developed  a  theory  for  the  posi- 
tive definite,  symmetric  case.  They  also  noted  that  any  linear  system  with  a  nonsingular 
matrix  can  be  transformed  into  a  positive  definite  system  by  using  the  normal  equa- 
tions. Thus,  in  the  terminology  of  Faddeev  and  Faddeeva  [11],  the  first  (left)  Gauss 
transform  can  be  used  to  obtain 

(3)  ATAx  =  ATb. 

The  matrix-vector  multiplication  required  in  each  iteration  of  a  conjugate  gradient 
method  can  be  carried  out  by  multiplying  the  vector  with  A  and  the  resulting  vector 
by  AT.  The  fact  that  such  a  factored  form  can  be  used  was  noticed  already  by  Hestenes 
and  Stiefel.  The  second  (right)  Gauss  transform 

(4)  AATy    =    b, 

x    =    ATy. 

can  also  be  used  to  transform  (1)  into  a  different  positive,  definite  symmetric  problem. 
For  the  symmetric  and  special  nonsymmetric  cases  considered  in  this  paper,  the  coef- 
ficient matrices  in  (3)  and  (4)  become  A2  and  /  -  A'2,  respectively.  Since  frequently 
the  main  computational  cost  of  a  conjugate  gradient  step  is  the  matrix-vector  multi- 
plication, the  use  a  Gauss  transform  increases  the  cost  per  step  considerably.  If  the 
positive  definite  product  matrix  is  not  formed,  the  cost  per  step  can  be  expected  to 
approximately  double;  if  it  is  formed,  the  product  of  A  and  AT  is  usually  less  sparse 
than  A. 

The  spectrum  of  the  iteration  operator  affects  the  rate  of  convergence  of  the  iterative 
methods;  generally  the  convergence  slows  with  an  increase  in  the  condition  number.  It 
is  therefore  desirable  to  avoid  systems  obtained  by  using  the  Gauss  transforms  since 
the  condition  number  of  A7  A  can  be  much  worse  than  that  of  A.  We  therefore  consider 
conjugate  gradient  type  methods  which  produce  a  new  approximation  after  only  one 
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matrix-vector  multiply  involving  the  original  matrix,  and  with  convergence  rates  that 
depend  on  the  spectrum  of  A  rather  than  AT A.  As  we  will  see,  there  will  be  some 
disappointments. 

In  the  absence  of  roundoff,  the  conjugate  gradient  method  terminates  in  n  steps 
or  less.  Because  of  this  finite  termination  property,  this  method  was  originally  viewed 
as  a  direct  method;  cf.  Hestenes  and  Stiefel  [24].  Reid  [34]  is  credited  with  the  revival 
of  the  algorithm  as  an  iterative  method.  If  the  system  is  well  conditioned,  much  fewer 
than  n  iterations  can  produce  a  very  good  approximation  to  the  solution.  For  a  detailed 
historical  review  of  the  conjugate  gradient  method  for  the  period  1948-1976,  see  Golub 
and  O'Leary  [17]. 

In  this  paper  we  will  not  attempt  to  discuss  the  effect  of  roundoff  on  the  error 
bounds.  We  only  note  that  this  issue  was  first  treated  by  Paige  [30]  and  Reid  [34]  and 
that  recent  results  by  Greenbaum  [19]  shed  new  light  on  the  issue  of  the  reliability  of 
the  error  bounds,  which  are  obtained  while  ignoring  roundoff. 

The  preconditioning  idea,  which  goes  back  at  least  to  Hestenes  [23],  consists  of 
replacing  "the  system  given  by  another  system  with  the  same  solution  but  with  a 
different  coefficient  matrix  A  of  smaller  condition  number",  Rutishauser  [35,  p. 38]. 
Thus,  the  preconditioned  system  is  expected  to  have  better  convergence  properties.  We 
can  use  positive  definite,  symmetric  preconditioned  for  all  the  methods  considered  for 
symmetric  problems  with  virtually  no  change  in  the  theory;  see  further  Sections  2  and 
4. 

Many  methods  of  conjugate  gradient  type  can  be  derived  variationally.  This  can  be 
done  systematically  by  selecting  an  expanding  set  of  subspaces,  known  as  the  Krylov 
subspaces,  and  by  defining  a  sequence  of  approximate  solutions  in  these  subspaces  by 
using  a  Galerkin  condition,  or  by  minimizing  the  norm  of  the  residual.  For  a  special 
choice  of  the  subspace,  an  approximate  solution,  which  minimizes  the  Euclidean  norm  of 
the  error,  can  also  be  computed.  By  exploiting  the  special  form  of  A,  we  show  in  Section 
2  that  the  Krylov  subspaces  can  be  defined  in  terms  of  Lanczos  vectors,  which  satisfy  a 
three-term  recurrence  relationship.  As  a  consequence,  the  iterates  obtained  using  these 
subspaces  satisfy  similar  simple  recursions;  cf.  Section  3.  Faber  and  Manteuffel  [10] 
have  shown  that  the  matrices  for  which  it  is  possible  to  obtain  such  simply  structured 
algorithms  essentially  are  of  the  form  A  =  e'e(T  +  i<xl),  where  i  :=  y/—  1,  6  and  a  are 
real,  and  T  is  Hermitian.  This  paper  covers  these  cases  when  the  matrix  A  is  real. 
An  analysis  of  the  complex  case  can  be  found  in  a  recent  paper  by  Freund  [14].  We 
refer  to  Ashby,  Saylor  and  Manteuffel  [1],  Dennis  and  Turner  [7],  Eisenstat,  Elman  and 
Schultz  [9],  Hageman  and  Young  [21],  Saad  and  Schultz  [37]  and  the  references  given 
therein,  for  a  description  of  methods  for  general  nonsymmetric  problems  for  which  no 
three-term  recurrence  is  possible. 

For  completeness,  we  include  a  derivation  of  the  algorithms  for  the  positive  definite 
case.  We  use  an  approach  quite  similar  to  that  of  Paige  and  Saunders  [31].  This  sets 
the  stage  for  a  description  of  their  algorithms  SYMMLQ  and  MINRES,  which  were 


introduced  in  the  same  paper  to  solve  indefinite  symmetric  problems.  The  main  result 
of  Subsection  3.3  shows  that  the  auxiliary  vectors,  introduced  by  Paige  and  Saunders  in 
the  SYMMLQ  algorithm  in  order  to  make  the  computation  stable,  are  minimum  error 
solutions;  cf.  also  Fletcher  [12].  An  underlying  variational  formulation  therefore  makes 
it  possible  to  derive  error  bounds  for  one  of  the  sequences  of  solutions  provided  by  the 
SYMMLQ  algorithm  using  techniques  very  similar  to  the  standard  case;  cf.  Section  4. 

The  special  nonsymmetric  problems  considered  here  naturally  arise  from  linear 
systems  with  coefficient  matrices  with  a  positive  definite  symmetric  part.  In  some 
applications,  it  is  preferable  to  repeatedly  solve  a  linear  system  with  the  symmetric 
part  as  coefficient  matrix  rather  than  solving  the  original  system  directly.  There  are 
also  interesting  applications  with  a  series  of  problems  with  different  coefficient  matrices 
but  with  the  same  symmetric  part;  cf.  Rapoport  [33].  The  symmetric  part  of  the 
coefficient  matrix  serves  as  a  preconditioner  and  the  problem  is  reduced  to  the  special 
nonsymmetric  form.  We  use  the  splitting 

(5)  A  =  M  -  N, 
where 

(6)  M   :=    {A  +  AT)/2 
is  the  symmetric,  positive  definite  and 

-N   :=    {A-AT)/2 

is  the  skew-symmetric  part  of  A;  see  Concus  and  Golub  [5]  and  Widlund  [46].  The 
operator 

(7)  K    :=    Ar'N 

plays  a  central  role  in  the  algorithm  and  in  our  analysis.  It  is  easy  to  see  that  K  is 
skew-symmetric  with  respect  to  the  M-inner  product  defined  by 

(8)  (u,v)m   :=    (u,Mv)   :=   uTMv. 

The  results  of  the  analysis  of  the  algorithm  obtained  by  using  a  Galerkin  condition, 
known  as  the  CGW  algorithm,  and  of  the  alternative  minimal  residual  algorithm  given 
in  the  thesis  of  Rapoport  [33],  are  disappointing.  As  first  shown  by  Young  [47],  cf. 
Hageman,  Luk  and  Young  [20],  the  CGW  algorithm  is  directly  related  to  a  doubling 
algorithm;  we  show  in  Subsection  3.5  that  the  CGW  algorithm  can  be  obtained  using 
a  Gauss  transform. 

In  the  final  section,  we  collect  the  error  bounds  known  to  us.  The  analysis  is  based 
on  best  polynomial  approximation  problems  on  the  spectrum  of  the  operator  which 
defines  the  Krylov  subspaces.  Such  a  result  is  well  known  for  the  positive  definite 
case;  cf.  Daniel  [6]  and  Luenberger  [28].  The  other  cases  are  now  also  well  understood 
primarily  because  of  the  efforts  of  Freund  [13,  14]  and  Freund  and  Ruscheweyeh  [15]. 
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2.  Krylov  Spaces,  Lanczos  Vectors  and  the  Restriction  of  the  Operators 
to  the  Subspaces. 

2.1.  The  symmetric  case.  The  approximate  solutions  Xk  of  (1)  are  constructed 
using  expanding  affine  spaces  of  the  form  Xo  +  S  ,  which  are  of  dimension  k  =  1,2, . . . 
The  Krylov  subspace,  Sk  =  Sk(A,Vi),  is  spanned  by  the  first  k  vectors  of  the  Krylov 
sequence 

(9)  v1,Av1,...,Ak-1vi,... 

We  note  that  for  any  constant  a,  Sk(A  +  aI,Vi)  =  Sk(A,Vi).  We  also  ignore  the 
normalization  of  the  element  v\  and  regard  Sk(A,  (3vi),  (3  ^  0,  as  the  same  as  Sk(A,  fi). 
The  operator  A  and  the  initial  element  v-i  completely  determine  these  spaces.  The 
Krylov  sequence  (9)  was  apparently  introduced  over  half  a  century  ago  by  Krylov  [25], 
see  also  Faddeev  and  Fadeeva  [11],  but  we  do  not  know  who  first  attributed  Krylov's 
name  to  it. 

In  this  subsection,  we  assume  that  A  is  symmetric.  The  system  (1)  can  be  precon- 
ditioned, from  the  left,  by  a  positive  definite  matrix  M.  Preferably  M  should  be  close 
to  A  in  the  sense  that  the  generalized  eigenvalues  of  the  pencil  defined  by  A  and  M  are 
close  to  each  other;  see  Section  4.  The  transformed  equation, 

(10)  M~1Ax  =  M~1bt 

is  then  solved  using  the  Krylov  subspaces  Sk(M~1A,  Vi).  The  operator  M~lA  is  sym- 
metric with  respect  to  the  the  M-inner  product  (8).  This  inner  product  has  to  be  used 
in  the  formulas  which  provide  the  parameters  for  the  Lanczos  vectors  and  the  itera- 
tive methods,  instead  of  the  Euclidean  inner  product.  Just  as  there  are  left  and  right 
Gauss  transforms,  there  are  left  and  right  preconditioned.  We  can  thus  also  transform 
equation  (1)  into 

AM~ly    =     b, 

where 

x    =    M~ly. 

In  this  case,  the  use  of  the  inner  product  uT M~^v  is  appropriate. 

The  initial  element  V\  of  the  Krylov  sequences,  which  is  relevant  in  this  work,  is 
the  initial  residual 

(11)  r0    :=    b-  Ax0, 

where  x0  is  the  initial  guess,  or  in  the  case  of  a  left  Gauss  transform,  Ar0.  If  a  left  pre- 
conditioner  is  chosen,  t'j  =  il/_1r0  or  M~lAM~lr0.  To  simplify  the  notations,  we  will, 


except  in  the  special  nonsymmetric  case,  only  discuss  the  case  when  no  preconditioner 
is  used. 

The  Lanczos  vectors  Vj,  j  =  1,2, .. .,  provide  an  orthonormal  basis  for  the  subspaces 
Sk;  cf.  Lanczos  [26].  They  are  defined  recursively,  are  unique  up  to  sign  and  are  obtained 
by  a  Gram-Schmidt  orthogonalization  process: 

Normalize  vu 

k 

(12)  /0k+i,JH-iV*+i    :=    Avk-Y,P>,k+\V„        fc  =  l,2,..., 

where  Z^+i^+i  is  chosen  so  that  vk+\  has  unit  norm.  Since  the  vectors  are  orthonormal, 

(13)  fcM1  =  (Avk,Vi),      i<k. 

The  Lanczos  vectors  depend  on  the  operator  A  and  the  direction  of  the  initial  vector. 
They  are  uniquely  defined  by  the  Krylov  sequence  and  the  inner  product  used  in  (13). 

In  the  cases  considered  in  this  paper,  equation  (12)  reduces  to  a  three-term  recur- 
rence similar  to  those  of  the  classical  theory  of  orthogonal  polynomials.  This  means 
that  (3,,k+i  =  0,  i  <  k  —  1.  It  is  convenient  to  change  our  notations  and  let  ak  :=  (3kik+i, 
(3k+i  :=  /3fc+i,*+i,  and  7*  :=  /?jfc_i,fc+i. 

LEMMA  2.1.   Let  A  be  symmetric,  let  v0  :=  0,  let  Uj  be  a  given  unit  vector,  and  let 

(14)  Pk+iVk+i    ■=   Avk  -  ockVk  -  IkVk-i- 

If  ak  :=  (Avk,Vk),  (3k+i  ■=  (Avk,vk+l)  and  fk  :=  (Avk,vk-i),  then  (ufc+1,^)  =  0, 
j  <  k,  i.e.  vk+i±Sk,  for  k  —  1,2,...  The  vectors  have  unit  length,  and  /3k  =  fk)  for 
k  =  2,  3, . . . 

Proof.  The  fact  that  /3k+i  =  7*4.1  follows  directly  from  the  symmetry  of  A. 

For  the  orthogonality,  we  proceed  by  induction.  The  case  of  k  =  1  is  trivial.  Assume 
that  the  conditions  are  valid  for  j  <  k.  We  find  that  /3jt+i(l'fc+ii  vk)  —  {Avk,vk)  —  ak  =  0. 
Since  Avk_\  =  0kvk  +  Qa._iVa._i  +  /3k-iVk-2,  we  find,  using  the  symmetry  of  A  and  the 
induction  hypothesis,  that 

Pk+i[vk+uVk-i)  =  {vk,Avk_i)  -  afc(wi,Uife_i)  -  7*  = 

=  0k  +  ctk-\(vk,Vk-i)  4-/3fc_i(ujt,ujfc_2)  -  7*  =  0k  -  ik  =  0. 
Finally,  for  j  <  k  —  2, 

Pk+i(vk+i,Vj)  =  (vk,Avj)  -ock(vk,Vj)  -7jfc(vfc_i,r;J). 
Since  v3  €  5fc_2,  Av3  G  Sk~x  and  all  three  terms  vanish  by  the  induction  hypothesis. 

Finally,  since  (3k+i(vk+u  i't+i)  =  {Avk,vk+i)  =  Jk+i,  it  follows  that  vk+\  has  unit 
length.  D 


We  can  therefore  write  the  three-term  recurrence  (14)  as 

(15)  /Afc+iUfc+i    =   Avk  -  atkvk  -  PkVk-\- 
Let 

(16)  Vk   ■=   [vuv2,...,vk] 

be  the  n  x  k  matrix  with  the  first  k  Lanczos  vectors  as  columns.  Then  equation  (15) 
gives 

(17)  AVk  =  VkTk  +  pk+1vk+1ejk). 


Here  ejk)  :=  (0 ,0, 1)  and  Tk  is  the  k  x  k  tridiagonal  matrix 


(fc) 


Qi      #2 

f32      02       #3 


fix       O,      A  +  l 

At         Ok 


By  the  orthogonality  of  the  Lanczos  vectors, 
(18) 


vkTAVk  =  rfc) 


i.e.   Tfc  is  a  matrix  representation  of  the  restriction  of  the  operator  ,4  to  the  subspace 
Sk  =  Sk(A,v-[)  using  the  convenient  basis  provided  by  the  Lanczos  vectors. 

2.2.  The  skew-symmetric  case.  We  will  now  consider  the  special  nonsymmetric 
case  (5)-(6)  with  M  positive  definite.  If  the  operator  A  is  skew-symmetric,  the  Lanczos 
vectors  can  be  constructed  quite  similarly  to  the  symmetric  case  just  considered. 

LEMMA  2.2.  Let  A  be  skew-symmetric,  let  v0  :=  0,  let  v-i  be  a  given  nonzero  vector, 
and  let 


19) 


Pk+lVk. 


Avk  -  7fcUjfc_i. 


If -lk   :=   (A^,vfc_1)/(^_i,i^_1),   k   >  2,    then  {vk+1,Vj)   =   0,  j  <  k,   i.e.    vk+ilSk, 
k  =  1,2,... 

Note  that  (3k+i  is  not  yet  defined,  i.e.  the  norm  of  the  vector  vk+\  has  not  been 
chosen.  In  the  derivation  of  the  CGW  algorithm,  cf.  Subsection  3.4,  it  is  convenient 
to  use  a  special  normalization  of  the  vectors  and  we  adopt  this  normalization  from  the 
start: 


(20) 


02  =  1,       &+1+7*  =  l,     k>2. 
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Proof  of  Lemma  2.2.  We  again  use  an  induction  proof.  The  case  of  k  =  1  is  easy,  since 
the  operator  is  skew  symmetric,  i.e.  (v,  Av)  =  0  for  all  v.  The  orthogonality  of  Vk+\  and 
Vk  also  follows  from  this  formula  and  the  induction  hypothesis,  and  that  of  v^+i  and 
Ujt_i  from  the  formula  for  7*.  The  orthogonality  of  Vk+i  and  Vj,  j  <  k  —  2,  is  established 
almost  exactly  as  in  the  proof  of  Lemma  2.1.  □ 

Lemma  2.2  applies  directly  to  the  operator  K  defined  in  (7)  if  the  M-inner  product 
(8)  is  used  throughout.  Thus, 

(21)  -yjfc   :=   {Kvk,vk-1)M/(vk-uvk-1)M  =  (Nvk,Vk-i)/(vk-i,Mvk-t), 

and  the  vectors  defined  by 

(22)  (3k+ivk+1    :=    Kvk-1kVk-i, 
are  A/-orthogonal.  Since  the  system  (1)  can  be  written  as 

(23)  {I-K)x  =  M-1b, 

the  initial  residual  is  M~xb  —  (I  —  K)x0  =  M~lr0  and  the  relevant  Krylov  subspaces 
are  Sk  =  Sk{I  -  A>j)  =  S^A'^'i),  with  Vl  =  M~xr0. 

The  following  results  will  be  used  in  Subsection  3.5. 

LEMMA  2.3.  Let  <f>0  and  4>e  be  odd  and  even  polynomials,  respectively.   Then 
(<f>e(K)v,<f>0{K)v)M  =  0,  for  all  v. 

Proof.  The  operator  <+>t{K)  is  symmetric  with  respect  to  the  M-inner  product.  There- 
fore, 

{<f>e(K)v,  <f>0{K)v)M  =  {v,<f>e(K)<f>0(K)v)M- 

Since  (j)e(K)(f)0(K)  is  an  odd  polynomial  in  A',  it  is  skew-symmetric  and  the  expression 
vanishes.  □ 

Lemma  2.3  directly  implies  the  following 

COROLLARY  2.4.    .S'2fc(A,?'1)  =  Sk(K2,vl)  ©M  Sk{K2J\vi),  where  ®m  denotes  a 
direct  sum  of  M -orthogonal  subspaces. 

Let 

(24)  pk    ■=   {vk,Vk)M- 
From  (22),  it  follows 

Pk+\Pk+i  =  {vk+i,Kvk)M  =  (vk+i,Nvk). 
Similarly  from  (21), 

IkPk-i  =  {Nvk,vk-i)  =  -(vk,Nvk-i). 


Therefore, 

(25)  7*  =  -faPk/pk-i- 

If  the  coefficients  fa  are  chosen  as  in  (20),  they  satisfy  the  recurrence  relation 

(26)  /?2  =  1,      fa+i  =  l  +  fa{pk/pk-i)- 
Let  Vk  be  defined  as  in  (16).  Then  from  (22),  it  follows  that 

(27)  KVk  =  VkJk  +  0fc+i»*+iecik). 


where  Jk  is  the  tridiagonal  matrix 


Jk    :  = 


0      72 

fa      0      73 


ft     0     7t+i 


fa      0  . 
Therefore, 
(28)  VkTAVk  =  \;T(A/  -  N)Vk  =  VkTM(l  -  K)Vk  =  Rk(I  -  J*), 

where  Rk  is  the  diagonal  matrix  with  entries  pi,  p2, .  ■  ■ ,  Pk\  see  (24).  By  using  (25)  and 

(26),  we  see  that  the  parameters  fa  >  0  and  7,  <  0  and  that  the  matrix  Jk  is  similar  to 

1/2 
a  skew-symmetric  tridiagonal  matrix,  via  the  diagonal  similarity  transformation  Rk    . 

3.   The  Basic  Algorithms. 

3.1.  The  Standard  Positive  Definite  Case.  In  this  subsection,  we  derive  the 
standard  conjugate  gradient,  the  minimum  residual  and  the  minimal  error  methods  for 
the  solution  of  (1),  where  A  is  symmetric,  positive  definite.  The  resulting  approximate 
solutions  belong  to  the  fc-dimensional  affine  subspaces  x0  +  Sk(A,rQ)  or  Xo  +  Sk(A,  Ar0). 

The  conjugate  gradient  iterates  xk  minimize  the  ,4-norm  of  the  error.  Thus, 


(29) 


1,     2-  e  x0  +  Sk(A,r0)  . 


Here  r0  =  b  —  Ax0  ,  x*  is  the  solution  of  (1)  and  (a:,y).4  =  xTAy.  The  expression  (29) 
can  also  be  written  as  the  square  of  the  ,4-1-norm  of  the  residual. 

The  minimal  residual  solution  is  determined  by  minimizing  the  Euclidean  norm  of 
the  residual  over  the  same  affine  subspace  while  the  minimal  error  solution  is  determined 
by  minimizing  the  Euclidean  norm  of  the  error  over  a  different  affine  subspace,  namely 


2"o  +  Sk(A,Ar0).    We  first  discuss  the  standard  conjugate  gradient  method  in  some 
detail. 

Since 

(30)  ||x  -  x*\\2A  =  (x,  Ax)  -  2(x,  b)  +  (x*,  6), 

the  minimization  problem  (29)  is,  after  dropping  a  constant  term,  equivalent  to 

min  [(x,  Ax)  -  2(x,  6)],     x  €  x0  +  Sk{A,  r0) 
and  to 

min  [(x  -  T0,A(x  -  x0))  -  2(x  -  x0,r0)],     x  6  x0  +  Sk{A,r0)  . 

By  an  elementary  variational  argument,  it  follows  that  the  same  solution  is  obtained 
from  the  Galerkin  equation 

(31)  (b-Ax,v)  =  0,  for  all  u€  Sk{A,r0),  x  e  x0  +  Sk(A,r0), 
which  can  also  be  written  as 

(32)  (A(x  -  x0),v)  =  {r0,v),  for  all  v  €  Sk(A,r0),  x  €  x0  +  Sk(A,r0). 

We  now  represent  x^  in  terms  of  the  Lanczos  vectors, 

(33)  xck  -  x0  =  Vkyck, 

where  Vk  is  defined  in  (16)  and  yck  :—  (//}    ,  i]2    , .  .  .  ,r;[    )T . 

Substituting  this  expression  into  the  Galerkin  equation  (32),  we  obtain 

(34)  VkTAVkyck  =  VkTr0. 

By  (18)  and  fl\V\  =  7-o,  this  is  equivalent  to  the  tridiagonal  system 

(35)  Tkyck  =  /?ie(1), 
where  e(1)  :=  (1,  0. . .  .  ,0)T. 


The  residual 


rk    :=    b-Ax\ 


is  a  multiple  of  the  Lanczos  vector  v^+i  and  the  residuals  are  therefore  orthogonal.  To 
see  this,  we  use  (17)  and  obtain 

AVky\  =  VkTky\  +  f3k+lvk+lejk)yck. 


By  using  that  /3iVi  =  r0,  (33)  and  (35),  we  find  that 

(36)  rk  =  -ft   Pk+iVk+i- 
Thus,  by  Lemma  2.1, 

(37)  rklSk(A,r0). 

Orthogonality  also  follows  immediately  from  the  Galerkin  condition  (31).  If  (3k+i  =  0, 
then  Ax\  =  b. 

Following  Paige  and  Saunders  [31],  we  now  provide  further  algorithmic  details. 
Cholesky's  method  can  be  used  to  solve  (35).  The  coefficient  matrix  is  factored 

(38)  Tk   =   EkDkEj, 

where  Dk  is  a  diagonal  matrix  with  elements  6i,  <52,  •  •  ■ ,  6k,  and  Ek  is  a  unit  lower 
bidiagonal  matrix  with  off-diagonal  elements  A2,  A3, . . . ,  A^.  These  elements  are  obtained 
by  using  the  recursions 

(39)  6x=au     \J  =  f3J/6J_u     6j  =  aj  -  ftXj,      j  =  2,...,k. 

Here  the  o^.  and  /3k  are  the  coefficients  of  the  three-term  recurrence  (15).  The  matrices 
Ek_\,  Dk_i  are  the  (k  —  1)  x  (k—l)  principal  minors  of  the  matrices  Ek,  Dk,  respectively. 
In  step  k  of  the  algorithm,  the  new  elements  A^,  6k  are  computed  using  only  the  values 
q>,  /3k  and  6k-\. 

Similarly,  the  solution  of  the  lower  triangular  system,  cf.  (35), 

(40)  EkDksk  =  0ie(1), 

sk  =  (^i, . . .  ,£k)T  has  the  same  k  —  1  first  components  as  sk-i-  The  component  £k  is 
obtained  by  the  formula 

(41)  £*  =  -6k-ihtk-i/Sk, 
with  £i  =  /?i/oi.  On  the  other  hand,  the  vector 

(42)  yl  =  EkTsk, 

changes  completely  from  one  step  to  the  next.  It  is  therefore  more  convenient  to  work 
with  an  additional,  /1-orthogonal,  basis  for  Sk.  As  in  (33)  let 

(43)  xck-x0  =  Pksk, 

where 

(44)  ft  =  [pi,Pa,...,Pfc]   :=   VkEkT, 

li 


(45)  xck  =  x0  +  ]T  £iPi  =  xck_x  +  £kPk. 

1=1 

It  follows  from  equation  (44)  that  the  vectors  p3  can  be  computed  directly  from  the  vj 
by  the  recursion 

(46)  Pi=Ui,     Pj ■  =  Vj  -  \jPj-i,     j>2. 
By  using  (37),  we  find  that 

(47)  (rjfc,pi)  =  0,     j<*. 

Equations  (45)  and  (46)  show  that  the  approximate  solution  xk  can  be  computed  directly 
from  the  previous  iterate  xck_x,  and  that  the  Lanczos  vectors  need  not  be  saved. 

In  summary,  a  step  of  the  conjugate  gradient  method  proceeds  by  computing  new 
values  of  a  and  0  using  the  formulas  in  Lemma  2.1,  a  new  Lanczos  vector  using  (15), 
new  values  of  S  and  A  using  (39),  a  new  component  £  of  5  using  (41),  a  new  vector  p  using 
(46)  and  finally  a  new  approximation  to  the  solution  using  (45).  In  this  implementation 
only  five  vectors  of  storage  are  necessary.  The  matrix  A  is  only  needed  in  terms  of  a 
subroutine  which  produces  the  matrix-vector  product  and,  as  previously  pointed  out, 
this  makes  the  algorithm  quite  independent  of  the  choice  of  data  structures,  etc. 

The  following  computation  shows  that  the  vectors  p,  are  conjugate,  i.e.  A-orthogonal, 

PjAPk  =  Ek'VkT  AVkE-kT  =  E-k'TkEk-T  =  Dk. 

The  vectors  p,  are  often  called  search  directions,  since  by  (45),  we  move  in  the  direction 
of  pk  when  we  go  from  xk_1  to  xck.  By  using  the  conjugacy  of  the  search  directions,  it 
is  easy  to  show  that  xk  can  be  obtained  by  solving 

||*I -**|fi  =  nun  ||(*S_i  +  A«PO-*li- 

By  setting  the  derivative  with  respect  to  //  equal  to  zero,  we  obtain  an  alternative 
formula  for  £* 

(48)  tk  =  {pk,rk-1)/(pk,APk). 

From  (45)  it  also  follows  that 

4  -  xk-\  =  ikPk- 

By  using  formulas  (36)  and  (46),  we  obtain 

S+i(*»+i  -  *et)  =  -(rfWi)-1r*  -  £*%+!(**  -  *Li), 
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and  thus 

**+i  =  4  -  6k+i(tf^*fi)_l(r*  +  ^A^r^/W^  -  4-i))- 

It  follows  from  (42)  that  t/^'  =  &.  From  (41)  and  (39)  we  then  obtain  the  form  of  the 
algorithm,  which  is  associated  with  Rutishauser  [35], 

(49)  xtn  =  xek  +  (1/&+00"*  +  \k+1Pk+i(xl  ~  4-i ))• 

We  now  turn  to  the  minimal  residual  algorithm.  As  we  have  previously  noted,  this 
solution  is  defined  by  minimizing  the  Euclidean  norm  of  the  residual  over  the  same 
affine  subspace  that  is  used  for  the  standard  conjugate  gradient  method.  By  quite 
similar  arguments,  we  obtain  a  five  diagonal  linear  system  of  equations, 

(50)  V?A2Vkyl  =  VkTAr0. 

By  further  straightforward  computations  using  (17),  we  can  rewrite  this  equation  as 

(51)  {T2k  +  0l+ieMeTk))yk  =  «ifre<i)  +  ftftep). 

We  will  return  to  a  more  detailed  discussion  of  equation  (51)  in  the  next  subsection. 

In  a  similar  way,  we  can  derive  a  pentadiagonal  linear  system  for  the  minimal  error 
solution.  This  solution  has  the  form 

(52)  xl  =  x0  +  AVkyek. 

The  resulting  linear  system  has  the  same  coefficient  matrix  as  (51),  but  a  different  right 
hand  side.  The  system  has  the  form 

VkTA2Vkyl  =  VkTrQ, 


(53)  (T2k  +  Pi+1e{k)e{k))yl  =  (3^. 

These  pentadiagonal  linear  systems  of  equations,  which  are  nonsingular  if  A  is 
nonsingular,  can  also  be  solved  by  Cholesky's  method.  A  typical  row  of  the  resulting 
triangular  matrices  has  two  nonzero  off-diagonal  elements  and  the  search  directions  can 
be  computed  recursively  from  the  two  previous  search  direction  vectors  and  one  Lanczos 
vector.  We  will  return  to  a  study  of  the  minimal  residual  and  minimal  error  solutions 
since  they  can  be  computed  in  a  stable  manner  for  any  problem  with  an  arbitrary 
nonsingular,  symmetric  but  not  necessarily  positive  definite,  coefficient  matrix. 


i:; 


3.2.  The  Indefinite  Symmetric  Case.  In  this  and  the  next  subsection,  we  con- 
sider the  case  where  A  is  symmetric,  nonsingular  but  indefinite.  The  Lanczos  vectors 
and  the  Krylov  spaces  are  defined  exactly  as  in  the  positive  definite  case.  The  quadratic 
form  (30)  no  longer  has  a  minimum,  but  we  can  use  the  Galerkin  condition  (31)  to  try 
to  find  a  unique  saddle  point.  As  in  the  previous  subsection,  we  then  obtain  the  tridiag- 
onal system  of  equations  (35).  The  coefficient  matrix  is  the  restriction  of  an  indefinite 
operator  to  a  subspace  and  the  tridiagonal  matrix  can  therefore  be  singular.  If  it  is 
almost  singular,  the  solution  of  equation  (35)  can  have  a  very  large  norm  and  catas- 
trophic cancellation  then  makes  the  procedure,  as  developed  for  the  positive  definite 
case,  numerically  unstable.  On  the  other  hand,  an  elementary  argument  shows  that  if 
the  tridiagonal  matrix  Tk  is  singular,  then  Tk+i  is  not,  unless  the  off-diagonal  element 
/3k+1  =  0.  By  Lemma  2.1  and  (36),  this  happens  only  if  the  residual  r^  =  0.  Because  of 
these  difficulties,  it  is  natural  to  consider  an  alternative  algorithm  of  solving  equation 
(35). 

We  follow  Paige  and  Saunders  [31]  and  use  an  LQ  factorization.  Other  possibili- 
ties have  been  explored  in  the  literature.  Thus  Chandra  [4]  developed  an  alternative 
algorithm,  SYMMBK,  in  which  the  tridiagonal  matrix  Tk  is  factored  using  both  lxl 
and  2x2  pivots,  as  in  the  algorithm  by  Bunch  and  Kaufman  [3].  Simon  [38],  on  the 
other  hand,  proposed  that  the  tridiagonal  matrix  and  the  norm  of  the  residuals  be 
computed  first,  without  generating  the  iterates,  and  that  when  the  norm  of  the  residual 
is  sufficiently  small,  the  linear  system  (35)  be  solved  by  a  stable  method.  Finally,  the 
solution  is  assembled  by  retrieving  the  Lanczos  vectors  from  secondary  storage  or  by 
regenerating  them;  cf.  Parlett  [32].  Both,  in  the  SYMMBK  algorithm  and  in  Simon's 
method,  only  the  standard  conjugate  gradient  approximation  xk  is  produced,  while  the 
SYMMLQ  algorithm  of  Paige  and  Saunders  also  provides  an  additional  sequence  of 
approximate  solutions  which  we  further  characterize. 

Early  on,  Fridman  [16]  proposed  that  the  minimum  error  approximations  to  the 
solution  of  (1)  in  the  subspace  Sk(A,Ar0)  be  used.  As  we  will  see,  the  choice  of  this 
initial  element  of  the  Krylov  space  is  crucial.  Let  u,-,  i  =  1,2...,  be  the  Lanczos  vec- 
tors spanning  Sk(A,  Ar0).  Fridman's  main  observation  was  that  the  minimum  error 
approximation  can  be  written  as 


xo  +  ^^jVj  ,  where  <pj  :=  (a;*  -  x0,Vj). 


By  the  orthogonality  of  the  Lanczos  vectors,  it  is  also  true  that  ifk  =  (x*  —  x%_1,Vk] 
By  using  (15),  we  find  that 

tpk  =  (x*  -  x\_x,Avk-x  -  aic-iVk-i  -  Pk-iVk-7.)/Pk  ■ 

Since  x*  —  x%_^  is  orthogonal  to  Sk~l{A,  Ar0),  we  find  that  for  k  >  1 

iPk  =  {A{x* - xl_i),vk-\)l fa  =  (nt-i,Vfc_i)//?*  • 
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Similarly,  since  the  initial  element  of  the  Krylov  sequence  is  Ar0, 

Vi  =  {x*  -  Xo.t'i)  =  (A(x*  -  x0),ro)/||v4r0||  =  (r0,  r0)/||/tr(1||. 

Fridman's  method  is  numerically  unstable;  see  Paige  and  Saunders  [31]  or  Stoer 
and  Freund  [40].  In  the  next  subsection,  we  show  that  the  additional  sequence  of  ap- 
proximate solutions  of  the  SYMMLQ  algorithm  provides  a  stable  computation  of  the 
minimum  error  solution. 

Adopting  the  notations  of  Paige  and  Saunders  [31],  we  factor  Tk 

(54)  Tk  =  LkQk, 

where  Qk  is  orthogonal  and  Lk  is  lower  triangular.  To  fully  understand  the  derivation 
of  the  SYMMLQ  algorithm,  we  need  to  examine  Lk  and  Qk  in  some  detail.  The  LQ 
factorization  is  computed  recursively.  The  orthogonal  matrix  Qk  is  the  product  of  k  —  1 
elementary  orthogonal  matrices  Qk-i,k  ' ' '  Q\,2-  In  the  first  step  an  elementary  sym- 
metric, orthogonal  matrix  Qi.2  is  used  to  eliminate  the  (1,2)  element  of  the  tridiagonal 
matrix  Tk.  This  orthogonal  matrix  is  block  diagonal  with  all  diagonal  elements  equal 
to  one,  except  for  a  leading  2x2  diagonal  block  of  the  form 


(55) 


c      s 
s     —c 


with  c2  +  s2  =  1.  It  is  easy  to  see  that  if  Tk  is  multiplied  from  the  right  by  Qi.2 
changes  occur  only  in  the  two  first  columns  and  that  a  nonzero  element  is  introduced 
in  the  (3,1)  position.  In  the  second  step  (and  in  the  steps  that  follow),  the  elements 
of  the  first  row  and  first  column  are  not  changed,  while  the  elements  in  the  three  first 
rows  and  two  first  columns  of  what  remains  of  the  matrix  are  changed  by  multiplying 
by  an  elementary  orthogonal  matrix  $2,3^  constructed  similarly  to  Qi,2-  Note  that  this 
(k  —  1)  x  (k  —  1)  matrix  involved  in  the  second  step  is  initially  tridiagonal.  A  similarly 
situated  element  below  the  diagonal  becomes  nonzero  but  there  is  no  further  loss  of 
sparsity.  In  the  second  step,  which  only  affects  the  second  and  third  column,  the  (2,2) 
element  is  changed  a  second  and  final  time.  By  the  same  arguments,  the  matrix  Lk 
differs  from  the  k  x  k  leading  minor  of  Lk+\  only  in  the  (k,  k)  element.  After  k  —  1  steps, 
the  lower  triangular  matrix  Lk  is  obtained.  It  has  nonzero  elements  in  positions  at  a 
distance  up  to  two  from  the  diagonal.  This  matrix  is  singular  if  and  only  the  matrix 
Tk  is  singular.  By  simple  matrix  algebra,  it  is  also  easy  to  see  that  Lk  is  a,  possibly 
singular,  Cholesky  factor  of  TJ*. 

As  in  the  positive  definite  case,  we  can  in  principle  obtain  the  conjugate  gradient 
solution,  for  those  k  for  which  it  exists,  by  setting 

(5G)  x\  -x0=  Wkzk. 

Here  zk  is  the  solution  of 

(57)  Lkzk  =  Ae(1), 
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and 

(58)  Wk  =  [wuw2,...,wk]   :=    VkQl; 

cf.  (40),  (43)  and  (44).  By  using  the  product  form  of  Qk  given  above,  we  see  that 
only  the  kth  column  of  Wk  changes  when  we  update  it  to  produce  Wk+1.  Since  the  xk 
iterates  do  not  always  exist  or  can  be  quite  susceptible  to  roundoff,  Paige  and  Saunders 
work  with  a  different  sequence  xk  from  which  xck  can  be  computed  upon  demand.  The 
sequence  xk  is  obtained  by  replacing  Lk  by  Lk,  the  k  x  k  leading  minor  of  Lk+i  and  Wk 
by  Wk,  the  first  k  columns  of  Wk+i.  Thus 

(59)  xLk  -  x0  =  Wkzk, 
where  zk  is  the  solution  of 

(60)  Lkzk  =  ^e{1). 

The  first  k  —  1  components  of  zk  coincide  with  zk_\  since  all  but  the  first  component  of 
the  right  hand  side  is  zero.  In  the  update  step  the  kth  row  of  Lk  is  calculated  to  obtain 
0t,  the  new  element  of  zk.  A  new  approximation  xk  is  given  by  the  formula 

(61)  xk  =  x^_j  +  (kwk; 

cf.  (45).  By  using  the  fact  that  only  the  last  rows  of  Lk  and  Lk  differ  and  (56)-(61),  it 
is  easy  to  see  that  the  conjugate  gradient  approximation  satisfies 

(62)  xck+i  =  xk  +(k+iwk+i. 
where  (k+i  is  the  last  component  of  Zk+i- 

The  SYMMLQ  algorithm  returns  either  xk  or  xck,  depending  on  which  one  has  the 
smallest  residual  norm,  once  a  prescribed  tolerance  has  been  met.  The  residual  norms 
are  computed  at  a  cost  of  a  few  scalar  operations  from  quantities  already  available;  see 
Paige  and  Saunders  [31]. 

In  the  same  paper,  Paige  and  Saunders  also  discuss  MINRES,  the  minimum  residual 
algorithm.  The  Cholesky  factorization  of  the  matrix  in  equation  (51)  is 

(63)  Tl  +  P2k+1e(k)eJk)  =  lkL[  +  (32k+1e(k)eJk)  =  LkLTk. 

This  Cholesky  factor  is  the  same  matrix  as  in  (60).  This  follows  from  the  fact  that  Lk 
is  the  k  x  k  leading  minor  of  Ljt+i. 

The  vector  yrk,  defined  by  equation  (51),  changes  completely  from  one  step  to  the 
next.  Therefore,  in  a  way  similar  to  the  conjugate  gradient  solution,  cf.  (43)-(45),  let 

(64)  xrk-x0  =  Mktkl 
where 

(65)  Mk  =  [m\,mi mk)   :=   VkLkT 
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and  tk  =  (n, . . . ,  rk)T  is  the  solution  of 


Lktk  =  a^ief,)  +  /31/32e(2). 

The  first  fc-  1  components  of  *fc  coincide  with  tk_x  since  all  but  the  first  two  components 
of  the  right  hand  side  are  zero.  We  obtain, 

xl  =  xl-i  +  r*m*- 

Since  the  last  row  of  Lk  has  nonzeros  only  in  the  last  three  positions,  the  search  direction 
mk  can  be  computed  recursively  from  the  Lanczos  vector  t;*,  mjt_i  and  mk_2- 

We  also  show  how  to  obtain  the  conjugate  gradient  solution  from  the  minimal 
residual  solution.  From  (63),  it  follows  that  Lk  and  Lk  differ  only  in  the  (k,k)  position. 
Let  Ck  be  the  diagonal  matrix  with  entries  l,...,l,cjt  such  that  Lk  =  LkCk.  The 
quantity  ck  appears  on  the  diagonal  of  the  2x2  matrix  (55)  of  Qk-i,k- 

Premultiplying  (34)  by  Tk  =  VkTAVk,  we  see  that  yck  satisfies 


while  yTk  satisfies 


and  tk  =  Llyk.  Thus, 


Tlyl  =  LkClLTky{  =  VkT  Ar0, 


LkLTkyTk  =  VkTAr0, 


i2  tT„.c 


CiL[yl  =  tk. 


We  rewrite  (33)  using  (65)  as 

xt-xo  =  VkLkTLTkyi  =  MkCk2tk. 
From  (64)  and  the  form  of  Cjt,  we  finally  obtain 

x\  =  xrk  +  Mfc(Cfc-2  -  I)tk  =  xrk  +  Tk(c~k2  -  l)mjfc. 

3.3.  Variational  Formulation  of  the  SYMMLQ  Algorithm.  Paige  and  Saun- 
ders introduced  the  second  sequence  of  iterates  x\  primarily  to  obtain  a  stable  algorithm. 
They  also  noted  that  xk  is  a  better  approximation  than  xk  of  x*  in  the  /2-norm.  This 
follows  from  (62)  since  from  (58),  Wkwk+i  =  0.  Fletcher  [12]  rediscovered  Fridman's 
method  independently,  while  considering  a  class  of  biconjugate  gradient  methods,  and 
showed  that  the  resulting  iterates  are  the  same  as  xk.  Since  one  of  the  iterative  se- 
quences can  be  characterized  variationally,  it  is  possible  to  derive  an  error  bound;  cf. 
Subsection  4.2. 

THEOREM  3.1.  The  sequence  xk  of  the  SYMMLQ  algorithm  is  the  minimum  error 
approximation  in  the  subspace  Sk(A,  Ar0). 
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Proof.  By  using  equations  (53),  (52),  (63),  (59)  and  (60),  we  see  that  the  theorem 
follows  by  showing  that 

(66)  Wk  =  AVkLf. 

The  matrix  Wk  consists  of  the  first  k  columns  of  M^+i  =  Vk+\Ql+1.  Postmultiplying 
the  m  x  (k  -f  1)  matrix  Wk+i  by  LJ+1,  we  obtain,  cf.  (54), 

14+iTjfe+a  =  AVk+1. 

The  first  k  columns  of  AVk+i  coincide  with  AVk.  The  matrix  LkT  is  upper  triangular 
and  is  the  k  x  k  principal  minor  of  Lk+V  Being  upper  triangular,  the  last  row  of  Lk+1 
has  a  nonzero  only  in  the  (k  +  1,  k-\- 1)  position.  Therefore,  AVkLkT  consists  of  the  first 
k  columns  of  AVk+iLk~+1  and  (66)  is  established.  □ 

3.4.  The  Special  Nonsymmetric  Case.  If  the  matrix  A  in  (1)  is  skew-symmetric, 
the  Lanczos  vectors,  which  span  the  Krylov  subspaces  are  constructed  as  in  Lemma  2.2. 
Similarly,  as  we  have  seen  in  Sections  1  and  2.2,  if  the  matrix  A  is  nonsymmetric  but 
can  be  split  as  in  (5)-(6)  with  M  positive  definite,  then  the  matrix  K  =  M~lN  is 
skew-symmetric  with  respect  to  the  M-inner  product  and  the  relevant  Krylov  space  is 
Sk(K, vi),  where 

(67)  Vl  =  Af-Vo  =  M~xb  -  (I  -  K)x0. 

In  this  case,  the  Galerkin  procedure  works  and  gives  invertible  linear  systems  of 
equations  and  approximations  for  which  error  bounds  can  be  derived.  We  first  describe 
the  method  developed  by  Concus,  Golub  [5]  and  Widlund  [46],  which  is  commonly 
known  as  the  CGW  method.  The  Galerkin  condition  has  the  form  (32)  and,  as  in  the 
symmetric,  positive  definite  case,  it  is  satisfied  by  xk  —  x0  =  Vkyk,  where  y9k  is  the 
solution  of 


(68) 


VkTAVkyi  =  VkTr0. 


From  (24),  (28)  and  (67),  it  follows  that  (68)  is  equivalent  to 

(69)  Rk(I-Jk)y9k  =  PieW. 

We  note  that  pt  =  (uj,  Mv{)  is  the  first  element  of  the  diagonal  matrix  Rk  and  that  we 
therefore  can  simplify  equation  (69)  to 

(70)  (I-Jk)y9k  =  e(1). 

With  the  normalization  given  by  (20),  the  LU  factorization  of  (/  —  Jk)  =  LkUk 
requires  no  arithmetic;  this  is  the  reason  for  chosing  (3k  in  this  way.  The  factors  are 

i  i  r& 


i 
-i 


i 


i   i 


-i  i 


uk 


-72 

03         -73 


A 


>*+i 
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It  is  also  easy  to  see  that 


LkC(k)  —  e(i), 
where  eTfc>    :=    (1,1,. . . , 1)   .  The  triangular  system 

UkVl  =  Hk) 

remains  to  be  solved.  We  wish  to  obtain  y9  =  Ukle.(k\  directly  from  y9k_1.  Using  block 
Gaussian  elimination,  we  find  that  since 


uk  = 


' 

0 

Uk.i 

0 

-Ik 

0 

flfe+1  . 

its  inverse  is 

u?  = 

ukl, 

(lk/lh+i)uk-i 

0 

1/ft+i 

where  uk-i  :  =  U^e^-i),  ((k-i)  :=  (0, ...  ,0, 1  )T.  Thus, 


yl  =  u^e{k)  =  u-k 

where  uk  — 


e(fc-i) 


f/fcJie(fc_i)  +  (7/.-M-+1  W-i 
1/ft+i 


Vk-i 
0 


+  uk, 


{lk/Pk+i)uk--i 

1/Afc+i 

from  y9k_J  and  uk_i.  The  Galerkin  solution  is  then 


Uk  Je(fc).  We  can  therefore  obtain  y9k  and  Uk  directly 


(71) 


4  -  ^o  =  VkVk  =  VkU'k  le{k)  =  K-i?/Li  +  VkUk. 


The  first  term  on  the  right  in  (71)  is  x3k_1  —  xQ  =  Vk-\yk_i  while  the  second  is  Ax9k 
x9k  —  xk-i  =  ^4ufc-  We  can  therefore  write  (71)  as 


(72) 

Since  by  (20),  <yk  =     -  pk+1, 


A*fc  =   ~n Vi_ltlfc_l    +   ~ Vk   =   ~ AxX_!   +  "= Ujfc. 

Pfc+1  Pfc  +  1  P/t  +  1  PJt+1 


7it 


1 


A+i       pVh 


-  1. 


Only  the  parameter 

(73) 


w*    :=    l/fik+i, 
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is  needed  and  we  rewrite  (72)  as 

Aif.  =  (uk  -  l)Aif_,  +  ukvk, 
or  equivalently 

(74)  x9k  =  a;f_2  +  uk{vk  +  4_i  -  4_2). 

LEMMA  3.2.  Lei  t^  6e  £/ie  Lanczos  vectors  constructed  as  in  Lemma  2.2  with  the 
initial  vector  given  by  (67),  and  the  normalization  (20).  Let  the  residuals  of  the  CGW 
method  be  rk  :=  b  —  Axgk.   Then  vk+i  =  M~xrk,  k  =  0,  • . . 

Proof.  We  use  induction.  From  (67)  the  assertion  holds  for  k  =  0.  Let  x_i  =  0.  From 

(74), 

rk  =  b-  Ax{  =  rk_2  -  uk{Avk  -  rk_^  +  rk_2). 
Premultiplving  by  M~l,  we  obtain 

M~lrk  =  M~lrk_2  -  wk{{l  -  K)vk  -  M~lrk^  +  M~lrk_2), 
which  by  the  induction  hypothesis  is 

M~xrk    =    ujt_i  -  wjt((7  -  K)vk  -  vk  +  vk^ )  = 

(75)  =    ukKvk  +  (1  -  uk)vk_1. 

The  lemma  follows  from  (73)  and  (20),  and  by  comparing  (75)  with  (22).  □ 

A  consequence  of  Lemma  3.2  is  that  pk  =  (vk,rk_i);  cf.  (24).  Also  note  that  from 
(26),  l/u-'/t  =  1  +  (pk/pk-\  )/u,'fc_i ,  k  >  1  and  this  is  how  u>k  is  actually  computed. 

We  conclude  the  subsection  with  a  description  of  the  algorithm  developed  by 
Rapoport  [33]  in  his  197S  dissertation,  which  is  the  minimal  residual  analog  to  the 
CGW  algorithm.  The  algorithm  is  derived,  straightforwardly,  by  minimizing  the  M_1- 
norm  of  the  residual  of  the  equation  (1)  over  the  same  affine  subspace.  Just  as  in 
the  symmetric  case,  cf.  (50)-(51),  a  pentadiagonal  linear  system  results.  As  noted 
by  Rapoport,  all  elements  next  to  the  diagonal  vanish.  However,  it  does  not  seem  to 
be  possible  to  realize  any  direct  savings  by  using  this  structure.  We  cannot  see  any 
convincing  reason  to  prefer  this  method  over  the  CGW  algorithm. 

3.5.  Variational  Formulation  of  the  CGW  Algorithm.  We  conclude  this  sec- 
tion by  showing  that  the  approximations  produced  by  the  CGW  algorithm  with  even 
and  odd  indices,  {x^}  and  {x%2k+i},  also  can  be  generated  by  applying  the  standard 
conjugate  gradient  method  to  the  system  obtained  after  using  the  second  Gauss  trans- 
form and  the  initial  guesses  xo  and  a;^,  respectively.  This  result  is  closely  related  to 
Corollary  2.4  and  greatly  simplifies  the  error  analysis,  given  in  the  next  section,  since 
the  theory  for  the  standard  symmetric,  positive  definite  case  applies. 
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It  is  disappointing  that  the  CGW  algorithm  offers  so  little  new  and  that  the  error 
bound  depends  on  the  spectrum  of  /  -  K 2.  However,  as  in  the  SYMMLQ  algorithm,  the 
CGW  algorithm  provides  two  subsequences,  the  odd  and  even.  There  are  cases  when 
one  of  them  converges  appreciably  faster  than  the  other  and  in  such  cases  the  CGW 
algorithm  can  be  faster  than  if  only  one  of  the  subsequences  were  used. 

The  result  can  be  derived  from  a  result  by  David  Young  [47],  who  appears  to  have 
been  the  first  to  observe  a  relationship  between  the  CGW  algorithm  and  a  standard 
algorithm  using  the  operator  /  —  A'2.  His  result  is  presented  in  a  context  different  than 
ours  in  Young,  Hageman  and  Luk  [20].  Further  results  are  given  in  a  paper  by  Eisenstat 
[8].  A  direct  connection  with  the  symmetric  system  obtained  by  using  the  second  Gauss 
transform  was  not  established  in  these  papers. 

The  use  of  the  second  Gauss  transform  amounts  to  a  change  of  variables 

(76)  x  =  (I  +  K)z. 
The  system  (23)  becomes 

(77)  (/  -  K2)z  =  M~xb  . 

When  the  standard  conjugate  gradient  method,  with  the  M-inner  product,  is  applied 
to  (77)  the  relevant  Krylov  subspace  is  Sk(I  -  K2,V\)  =  Sk(K2,v1),  where  v1  is  the 
initial  residual  given  bv 

(78)  ui  =  M~'b  -  (I  -  K2)z0  =  M~lb  -  (/  -  A>0  =  M'xr0. 

We  also  note  that  by  a  direct  calculation  using  (74),  with  u>\  :=  1  and  x_\  :=  0,  the 
approximation  after  the  first  step  of  the  CGW  algorithm  is  given  by 

(79)  X™  =  X0+V1. 

By  Lemma  3.2,  the  residual  after  the  first  step  and  the  second  Lanczos  vector  are  given 
by 

(80)  r-i  —  MI\'r0    and    v2  =  Kih- 

The  main  result  is  formulated  as  follows. 

THEOREM  3.3.  Let  {x™},  k  =  0. 1, . . .,  be  the  CGW  approximations  to  x*  for  the 
special  nonsymmetric  problem  (5)-(6).  Let  z0  and  Z\  be  defined  by  x0  =  (I  +  A')zo  and 
X™  —  {I  -f-  K)z\  and  consider  the  system  obtained  after  the  second  Gauss  transform 

(81)  (/  -  K2)z  =  M~lb  . 

Let  Zk  be  the  conjugate  gradient  solution  of  (81)  using  the  initial  guess  zq.  Then  the 
approximations  with  even  indices,  x^,  A-  =  0,1,.. .,  coincide  with  (I  +  L\)zk-    The  odd 
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subsequence  can  similarly  be  obtained  by  the  conjugate  gradient  solution  of  the  same 
system  using  the  initial  guess  z\. 

Proof.  By  the  Galerkin  condition  (32),  the  solution  x^  is  the  unique  element  in  x0  + 
S2k(K,Vi)  with  a  residual  M~lb  -  {I  -  A')x^fc,  which  is  M-orthogonal  to  S2k(K,Vi). 
Similarly,  z*  is  the  unique  element  in  zo  +  S  (K2,Vi)  such  that  the  residual 

nt  :=  vi  -  (I  -  K2){zk  -  z0) 

is  M-orthogonal  to  Sk(K2,v1).  Since  (/  +  K)(zk  -  z0)  €  (/  +  K)Sk{K2,v1)  and 
(I  +  K)Sk(K2,vi)  C  S2k(K,  t>i),  we  can  by  Corollary  2.4  complete  the  proof  of  the 
theorem  for  the  even  case  by  proving  that  rk  is  M-orthogonal  to  all  of  S2k(K, Vi). 
Since  rk  E  5fc+1(A'2,i;i),  this  follows  directly  from  Lemma  2.3.  The  result  for  the  odd 
subsequence  is  obtained  in  almost  exactly  the  same  way,  by  using  the  formulas  given  in 
(79)-(80).  □ 

4.  Error  Bounds.  It  is  easy  to  see  that  any  element  of  the  Krylov  space  Sk(A,  v\) 
can  be  represented  as  <j>k-i{A)vi,  where  <j>k-i  E  Vk-\i  the  space  of  polynomials  of  degree 
k  —  1.  The  errors  of  the  conjugate  gradient  methods  considered  in  this  paper  can  be 
estimated  from  above  in  terms  of  the  solution  of  certain  best  polynomial  approximation 
problems.  This  is  well  known  for  the  standard  conjugate  gradient  method;  cf.  Daniel 
[6]  and  Luenberger  [28].  As  we  will  see,  the  same  estimate  can  also  be  used  for  several 
other  cases,  but  for  others,  more  complicated  polynomial  approximation  problems  arise. 
These  problems  are  also  quite  well  understood,  primarily  because  of  the  efforts  of  Freund 
[14]  and  Freund  and  Ruscheweyh  [15]. 

Throughout,  x*  denotes  the  solution  of  (1).  Two  affine  polynomial  spaces  of  di- 
mension k  will  be  used.  They  are 

V°k_i  :=l+tf>fc-i, 

in  other  words  the  polynomials  of  degree  k  that  are  equal  to  1  at  the  origin,  and 

PL,  :=  i  +  '2^-i . 

or  the  polynomials  of  degree  k  -f  1  that  are  equal  to  1  at  the  origin  and  have  zero  first 
derivative  at  the  same  point.  It  is  easy  to  see  that  the  natural  bases  of  tVk-\  and 
t2Vk-\  form  so-called  Haar  systems;  cf.  Meinardus  [29].  The  existence  and  uniqueness 
of  the  solution  of  the  approximation  problems  that  we  are  going  to  formulate  follows 
immediately. 

4.1.  The  Standard  Estimate.  We  begin  by  reviewing  the  standard  case.  The 
conjugate  gradient  solution  xck  of  (1)  is  the  minimizing  element  for  (29).  Thus,  the  error 
ek  :—  xck  —  x*  satisfies 

(82)  eTkAek=       min     (cf>k_1(A)r{i  +  e0)T  A(<pk-i{A)r0  +  e0). 

^(c-ieTV-i 
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Since  r0  =  —  Ae0,  (82)  can  be  rewritten  as 

e[A(k=       min     {tk_l{A)e0)T  A(xJ'k-i(A)t  „). 

By  expanding  the  initial  error  eo  in  eigenvectors  of  A,  it  is  easy  to  show,  that 

(83)  -nr-A —  <       min        max   xblAo  t) , 

where  <t(j4)  is  the  spectrum  of  A. 

We  can  now  obtain  an  upper  bound  of  (83)  for  the  symmetric  positive  definite  case; 
see  Daniel  [6]  and  Luenberger  [28]. 

THEOREM  4.1.  Let  {xck},  k  =  0,1,...,  6c  the  conjugate  gradient  approximations 
to  x* ,  and  let  A  =  AT  be  positive  definite.  Let  0  <  a  :=  m'ma(A),  /?  :=  maxcr(j4)  , 
K  :=  k(A)  =  the  condition  number  of  A  =  (3 /a,  and  p  =  (\/k  +  1  )/(>/*  —  1)  .  Then, 
the  error  ek  :=  x\  —  x*  satisfies 

(84)  I41  < 


eo|U       ?   +  P~ 
Proof.  From  (83),  it  follows  that 

- — —  <       mm       max    t,ff     . 

IMU  ^-,€r».,^[a,/J] 

This  approximation  problem  has  a  known  unique  solution,   namely,  the  normalized 
Chebyshev  polynomial  given  by 

TK(~2C+0+a) 
(85)  0*-i(C)  = 


ftCfSl 


where  Tk{n)  =  cos(k  arccosi])  is  the  k"1  Chebyshev  polynomial.   By  using  a  standard 
formula,  we  obtain  the  bound  (84).  □ 


By  dropping  a  term  in  the  denominator,  we  obtain  a  standard  form  of  the  bound 

<2(- 


klU  ^  o/v/k-  l,k 


e0    .4 


v^+1 


These  upper  bounds  depend  only  on  the  condition  number  k.  If  more  information 
on  the  distribution  of  the  eigenvalues  of  A  is  available,  better  bounds  can  be  obtained; 
see  Axelson  [2],  Greenbaum  [18],  Hayes  [22],  Lebedev  [27],  Saad  [36],  van  der  Vorst 
[44]  and  Widlund  [46].  See  the  discussion  below  of  the  symmetric  indefinite  case  for  a 
sample  argument  of  this  kind. 

The  analysis  of  the  algorithms  based  on  the  straightforward  use  of  the  first  and 
second  Gauss  transforms  is  very  similar  to  the  standard  case.     A  direct  calculation 
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shows  that  the  resulting  bounds,  which  compare  the  l%—  norm  of  the  residual  and  the 
error,  respectively,  after  k  steps  with  the  corresponding  initial  error  measure,  reduces 
to  the  standard  case.  The  same  bounds  based  on  Chebyshev  polynomials  result,  except 
that  the  relevant  spectrum  is  that  of  ATA. 

By  Theorem  3.3,  the  CGW  algorithm  provides  two  subsequences  of  iterates,  each 
of  which  can  be  derived  via  the  second  Gauss  transform.  If  A  is  the  spectral  radius  of 
K,  i.e.  o~{K)  C  [— iA,iA],  the  condition  number  of  the  matrix  in  (81)  is  bounded  by 
1  +  A2.  In  practical  applications,  either  0  €  c(A'),  or  there  is  an  eigenvalue  very  close 
to  zero.  Thus,  we  can  assume  that  the  condition  number  of  I  —  K2  is  equal  to  1  4-  A2. 

THEOREM  4.2.  Let  {xgk},  k  =  0,1, . . .,  be  the  CGW  approximations  to  x*  for  the 
special  nonsymmetric  problem  (5)-(6).  Let  A  be  the  spectral  radius  of  K ,  i.e.  o~{K)  C 
[— iA,iA].  Denote  the  error  by  egk  :=  x9k  —  x*.   Then,  the  even  subsequence  satisfies 

tzr\  \\e2k\\M     .  2 

(h6)  11     11        < 


eo||A/         P    +  P~ 
where  p  =  {\/\  +  \2  +  \)l{\/\  +  A2-l).  The  bound  (86)  also  holds  for  ||e2t+1||M/||ei||M- 

Proof.  By  Theorem  3.3,  e2*  =  {I  +  K){zk  -  z*),  where  (/  +  K)z*  =  x*  and  zk  is  the 
conjugate  gradient  solution  of  (81)  using  the  initial  guess  z0  defined  by  (/  +  K)z0  =  xQ. 
Since  K  is  skew-symmetric  with  respect  to  the  A/-inner  product, 

(e2k,e2k)M    =    (zk-z*,(I-K)(I  +  K)(zk-z*))M  . 

Thus,  by  Theorem  4.1,  the  bound  (86)  holds  for  the  even  case.  The  odd  case  is  com- 
pletely analogous.   □ 

4.2.  The  Other  Estimates.  We  first  consider  the  minimal  residual  method  for 
the  special  nonsymmetric  case  (2),  first  considered  in  Rapoport  [33].  It  is  easy  to  obtain 
an  upper  bound  for  the  error  in  terms  of  a  comparison  polynomial  by  using  the  same 
techniques  as  for  the  standard  case.  The  spectrum  is  confined  to  an  interval  1  +  i[— A,  A]. 
This  polynomial  approximation  problem  has  been  solved  completely  by  Freund  and 
Ruscheweyh  [15].  The  following  bound  is  best  possible  if  we  have  no  information  except 
that  the  spectrum  is  confined  to  this  interval. 

THEOREM  4.3.  Let  {xTk},  A-  =  0,1,.. .,  be  the  minimum  residual  approximations  to 
x*  for  the  special  nonsymmetric  problem  (5)-(6).  Let  A  be  the  spectral  radius  of  K ,  i.e. 
cr(A')  C  [— iA,i'A].   Then,  the  residual  rk  :=  b—  AxTk  satisfies 

IK1U/-'  <       2 


gr  -r  pk  2 ' 


where  p  =  (v/l  +  A2  +  1)/A. 

For  large  values  of  A  this  bound  is  asymptotically  similar  and  as  disappointing  as  the 
bound  in  Theorem  4.2.  For  a  proof  of  Theorem  4.3,  we  refer  to  Freund  and  Ruscheweyh 
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[15].  For  the  more  general  case  of  A  =  e'8(T  +  ial),  0  and  a  real,  T  Hermitian,  Freund 
[14]  gives  useful  upper  and  lower  bounds  for  the  decay  of  the  norm  of  the  residuals.  We 
also  note  that  Widlund  [46]  contains  a  result  on  superlinear  convergence  for  cases  with 
clustered  eigenvalues. 

We  now  turn  to  the  symmetric,  indefinite  case.  We  have  shown  in  Theorem  3.1  that 
one  of  the  sequences  produced  by  the  SYMMLQ  algorithm,  xk,  is  the  minimum  error 
solution  of  the  equation  (1)  and  x^  —  xq  G  Sk(A,  Ar0).  Therefore,  the  error  ek  :=  x^—x* 
satisfies 

eTkek     =         min     (<j>k-i(A)Ar0  +  e0)T{<t>k-\{A)Ar0  +  e0)  = 

(87)  =        min     {Xk-i{A)e0)T(xk-i{A)e0), 

Expanding  e~o  in  the  eigenvectors  of  A,  we  find  that 

-^—  <       mm0        max  Xi_i(cr,). 

Fletcher  [12]  appears  to  have  been  the  first  to  understand  the  relation  between  the 
minimum  error  solution  and  SYMMLQ,  but  he  did  not  use  that  result  to  produce  error 
bounds.  The  analysis  above  of  the  convergence  of  SYMMLQ  was  given  by  Widlund 
[45];  see  also  Stoer  and  Freund  [40],  Szyld  [41]  and  Szyld  and  Widlund  [43]. 

If  A  is  indefinite,  its  spectrum  has  both  positive  and  negative  eigenvalues.  If  we 
only  know  that  o{A)  C  [— /?,  —a]  U  [a,  /?],  j3  >  a  >  0,  we  obtain  the  bound 

INI    <  -       ,  ,, 

— —     <         min        max     \'fc-i  ^) 
||e0||  xfc-ie^a^kl^ 

=     ,    mi^o       2ma^J^-i(T)l- 

where  fr  :=  [(k  +  l)/2]  is  the  integer  part  of  (A:+  1  )/2.  The  last  equality  follows  from  the 
fact  that  the  minimizing  polynomial  must  even,  and  from  a  change  of  variables  r  =  a2. 
Using  Chebyshev  polynomials  as  in  Theorem  4.1,  it  follows  that 

(88)  M<2fK'1 


Co||    ~        \K  +  1 

where  k  =  (3 /a;  see  Freund  [13]  and  Stoer  [39]. 

The  bound  (88)  can  be  very  pessimistic.  A  better  bound  is  obtained  if  we  assume 
that  a(A)  C  {A^  A2,  •  •  • ,  Am}  U  [a, /?],  where  A,,  i  =  1 ,  -  -  • ,  777.  are  negative  eigenvalues 
and  /3  >  a  >  0.  This  is  typical  in  many  applications,  e.g.  for  the  linear  systems 
of  inverse  iteration;  see  Szyld  [41,  42].  The  bound  obtained  below  is  asymptotically 
similar  to  that  of  the  standard  conjugate  gradient  method;  cf.  (84).  This  explains 
the  excellent  convergence  properties  observed  in  practice  for  SYMMLQ.  The  relevant 
quantity  k  :=  0/a  is  sometimes  called  the  reduced  condition  number  of  A. 
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THEOREM   4.4.     Let  {x%},   k  =  0,1,...,   be  the  auxiliary  sequence  of  SYMMLQ 
approximations  to  x* ,  obtained  for  a  symmetric  indefinite  problem.  Let  o~(A)    C 
{Ai,  A2,  •  •  •  i  Am}  U[o,  /?],  where  Aj,  i  =  1,«  •  •  ,m  are  negative  eigenvalues  and  f3  >  a  >  0. 
Then,  for  k  >  m,  the  error  e*  :=  x£  —  x*  satisfies, 

(89)  14  <  «* 


, , e0 1 1  "  /~m  +  />m~fc 

//ere  p  =  (\/k  +  l)/(y/k  —  1),  £  =  /?/a,  arce?  fl*  is  a  linear  function  of  k,  which  is  given 
in  the  proof. 

Proof.  We  use  a  construction  due  to  Freund  [13]  to  obtain  a  comparison  polynomial  in 
-po 

Pm(<7")<7(^)</'*-m(cr)- 

Here  V'*:-m(c)  is  the  normalized  Chebyshev  polynomial  (85)  of  degree  A'  —  m  associated 
with  the  interval  [a,/?],  chosen  so  that  ^._m(0)  =  1.  The  polynomial  pm{cr)  of  degree 
m  vanishes  at  A,    =   0,  i  =  1,-  •  •  ,m  and  pm(0)  =  1,  i.e. 

m 

Pro(a)  =  n(l-T-)- 

7=1  A3 

The  parameter  ^  in  the  linear  function  q(a)  =  1  +  fur  is  chosen  so  that  x'*:+i(0)  =  0' 
which  leads  to  the  formula 

_  "  l        2     r^_m(7) 

''      ,tjA,-  +  09-a)  T,_m(7)  ' 

where  7  =  (/?  +  a)/(/?-a).  It  can  be  shown  that  T'k_m(^)ITk-m{l)  <  {k-m)/y/rr^T. 
Therefore, 

— ^-     <     max  |pm(a)9(a)0fc_m(a)|  =  max  |pm(CT)9(a)0fc_m(a)| 

||eo||  <7«r(.4)  tfc[a,/3] 

2 

<     max  \pn{o-)q{o-)\  max  |t/'m-i;(cr)|  <  flk  — 


CTf[a,/3]  *<[£»,/?]  pk~m  +  p" 


where 


We  remark  that  in  the  case  when  there  are  small  isolated  positive  eigenvalues  the 
bound  (89)  can  be  improved  using  a  similar  comparison  polynomial. 

The  MINRES  algorithm,  also  considered  by  Paige  and  Saunders  [31]  only  generates 
one  sequence  of  approximate  solutions.  Under  the  assumptions  on  the  spectrum  given 
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in  Theorem  4.4,  the  decrease  of  the  norm  of  the  residuals  satisfies  the  bound  given  in 
that  theorem,  where 

n*=n(i-7-) 

is  independent  of  k.  This  follows  from  a  straightforward  argument,  since  the  comparison 
polynomial  need  not  satisfy  the  additional  constraint  of  a  zero  derivative  at  the  origin. 

We  end  this  section  by  mentioning  an  interesting  polynomial  approximation  prob- 
lem related  to  the  study  of  the  minimum  error  solution  for  symmetric  problems,  in  the 
positive  definite  case.  In  a  recent  paper  Freund  identifies  the  optimal  polynomial  in 
terms  of  certain  Zolotarev  polynomials.  The  optimal  polynomial  is  given  in  terms  of  a 
parameter  which  can  be  determined  by  solving  a  certain  nonlinear  equation;  see  Freund 
[14].  These  formulas  are  relatively  complicated  to  use  but  Freund  also  provides  the 
following  bounds  for  the  decrease  of  the  norm  of  the  error  in  the  positive  definite  case: 

2         ^  Ito-rl  <0   (*-l)ft*-i(p)  +  fcMf) 

Pk  +  P~k  -  11*0 -**||  ~     b2k.1(p)  +  (2k-i)b1{Py 

vrhere  p  =  {yfi  +  l)/(yft  -  \)  and  k(p)  =  (l/2)(pl  -  p~l). 
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